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Abstract 

In this paper we consider a fractional stochastic volatility model, that is a model 
in which the volatility may exhibit a long-range dependent or a rough/antipersistent 
behavior. We propose a dynamic sequential Monte Carlo methodology that is applicable 
to both long memory and antipersistent processes in order to estimate the volatility as 
well as the unknown parameters of the model. We establish a central limit theorem 
for the state and parameter filters and we study asymptotic properties (consistency 
and asymptotic normality) for the filter. We illustrate our results with a simulation 
study and we apply our method to estimating the volatility and the parameters of a 
long-range dependent model for S& P 500 data. 

Keywords: long memory stochastic volatility, rough stochastic volatility, parameter estima¬ 
tion, particle filtering. 

1 Introduction 

Empirical studies show that the volatility may exhibit correlations of the squared log returns 
that decay at a hyperbolic rate, instead of an exponential rate as the lag increases, see for 
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example [6], [16]. This slow decay cannot be explained by a GARCH stochastic volatility 
model or by a stochastic volatility model with jumps. In the literature, this behavior has 
been described by a class of models that exhibit long-range dependence in the volatility. 
In contrast to the models where the dependence is introduced in the stock returns, |32j . 
the standard assumption of absence of arbitrage opportunities is preserved. The first long 
memory stochastic volatility (LMSV) model was introduced in discrete time by |6] and |26j . 
In particular, the authors model the stock returns by: 

{ n = (Jt 

at = a exp {ut} , where (1 - B^ut = rjt 

where a > 0, et are independent and identically distributed (iid) A^(0,1), rjt are iid ^"(0, 
and independent with et and d G (0,1/2) is the memory parameter. The feature of long- 
memory here stems from the fractional difference (1—where B denotes the lag operator, 
i.e. BXt = Xt-i. 

The continuous analogue of the LMSV model was introduced by Comte and Renault in 
|10j as a continuous-time mean reverting process in the Hull-White setting. More specifi¬ 
cally, their model resembles a classical stochastic volatility model in which the log returns 
follow a Geometric Brownian Motion , however, the volatility process is described by a frac¬ 
tional Ornstein-Uhlenbeck process; that is the standard Ornstein-Uhlenbeck process where 
the Brownian motion is replaced by a fractional Brownian motion. 

A large number of recent papers have considered modeling the volatility in terms of long- 
range dependent or antipersistent processes. In m , the authors propose an affine version of 
the long memory model in uni for option pricing and they argue that long-range dependence 
provides an explanation for observations of non-flat term structure in long maturities. In 
0, 0, the authors propose a pricing tree algorithm in order to compute option prices in 
the discrete and continuous time frameworks. They also propose a calibration method for 
determining the memory parameter. In the authors propose a rough fractional 

stochastic volatility model, which is used to explain strong skews or smiles in the implied 
volatility surface for very short time maturities. In a more recent paper, m, the authors 
discuss the case of small volatility fluctuations of both short and long memory models and 
their impact on the implied volatility and derive a corrected Black-Scholes formula. 

In this article, we address two problems: filtering of the unobserved volatility and pa¬ 
rameter estimation in the case of a stochastic volatility model that is either rough or antiper¬ 
sistent. Specihcally, we adapt a Sequential Monte Carlo algorithm in the non-Markovian 
framework and then we estimate the parameters online, following an idea initially intro¬ 
duced by [30] . 

Sequential Monte Carlo (SMC) methods, also known as particle filters or recursive 
Monte Carlo filters, arose from the seminal work of Gordon, Salmond and Smith (1993), 
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|22j . SMC methods are iterative algorithms that are based on the dynamic evolution and 
update of discrete sets of sampled state vectors, that are referred to as particles, and are 
associated with properly selected weights. For an introduction to the SMC literature, there 
are several reviews and tutorials, including, the edited volume by Doucet, Freitas and 
Gordon (2001), [T7] and Kantas et al. (2010), [28] , 

In order to keep the adaptive nature of the SMC algorithm, we incorporate the parame¬ 
ter estimation procedure in the algorithm by treating the parameters as artificially random, 
sampled from a kernel density. Generally speaking, the approach relies on augmenting the 
unobserved state by considering the parameter as an unobserved state. We do not assume 
that the unknown parameter is fully random, and thus we do not have an additional MCMC 
step in the algorithm. We emphasize here that this is an important point, due to the heavy 
computational overhead caused by the long-memory issue. Also, as it is well known, us¬ 
ing SMC for maximum likelihood based methods for estimating parameters like the Hurst 
parameter is difficult. 

The output of the algorithm is a provably asymptotically consistent estimator along 
with the corresponding variance. One of the appeals of the proposed method is the simplic¬ 
ity in its application, which becomes in particularly important due to the memory issue. 
However, we do mention here that overdispersion due to the artificial random evolution of 
the parameter is an issue that comes up with such methods, see for example m- But, as 
we shall see, the overdispersion that the artificial random evolution of the parameter in¬ 
troduces, can be minimized by properly tuning the parameters of the mixture distribution 
used for sampling the parameter. Related works using different methods can be found in 

mm- 

The study of the asymptotic properties of SMC methods and the asymptotic properties 
of the filter as the number of particle increases is a quite hard problem. A form of consistency 
of the filter, when the number of particles tends to infinity is a common result in the majority 
of the literature, while central limit theorem type of results are fewer. One can refer, for 
example, to the work of Del Moral and Guionnet (1999), |15) . To the best of our knowledge, 
the most general results in the literature can be found in Chopin (2004), |12j . Doric and 
Moulines (2008), [l8], and Kiinsch (2005), [29] . In these works and under slightly different 
conditions, law of large numbers and central limit theorems are derived that apply to most 
sequential Monte Carlo techniques in the literature. 

The rest of the paper is organized as follows: In Section 2, we introduce the math¬ 
ematical formulation of the problem. In Section 3, we introduce the SISR (Sequential 
Importance Sampling with Resampling) method with parameter learning and we present 
the theoretical results for the proposed parameter estimators. In Section 4, we study the 
performance of both methods using simulated data. In Section 5, we apply our method 
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in estimating the unobserved volatility of a discrete-time stochastic volatility model with 
long-range dependence for S&: P 500 data. Finally, we summarize our results in Section 6. 


2 Mathematical Framework 


Consider a state-space model in which the state vector is denoted by {Xt}t>i and the 
observations {Yt}t>i are obtained sequentially in time. In addition, we assume that the 
state vector depends on an unknown, but fixed, parameter vector that we denote by 9. 
In the sequel, we use the notation X oi Y for the random variables and x or y for the 
corresponding realized values. 

Unlike other models in the literature, we do not assume that the state vector {Xt}t>i is 
a Markovian process. Instead, we consider the case in which the unobserved process is not 
necessarily Markovian, with particular interest in the long-range dependent case. Formally, 
long-range dependence or long-memory is defined as follows: 


Definition 1. For a stationary process {Xt}t>i, there exists a parameter (Hurst index) 
i/ G (^, 1), such that 


lim 

t^OO 


Corr{Xt,Xi) 

ct2-2H 


= 1 , 


( 1 ) 


where p{h) := Corr{Xt, Xt-h) is the autocorrelation funetion of the proeess. 


When H = ^, then the process is Markovian, so this is a generalization of the models 
that are treated in the SMC literature. Equivalently, long-range dependence implies that 
the autocorrelation function p(h) of a long-range dependent process is non-summable, that 
is P{^) = CO- If the auto-correlation function is summable, then the process has what 

is called antipersistence, in which case H G (0, ^). 

Formally, at time t, the state-space model is specified by the observation equation that 
is determined by the observation density 


P{yt\xu6) 

and the state equation given by the conditional density 


p{xt\xt-i,... ,xi;9), 

where 0 G 0 is an unknown vector of parameters, and 0 C is open. We assume that the 
observations {Yt}t>i are conditionally independent given {Xt}t>i and that the long-range 
dependent process {Xt}t>i has known initial density Xi h{x;9). 

In this article, our goal is to use simulation for online filtering. In other words, we want 
to learn about the current state Xt given available information up to time t, which reduces 
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to estimating the probability distribution function 

p{xt\yi, ...,yt;0), t = l,...,n, 

where {yt}t>i are the observations up to time t. However, since we assume that the pa¬ 
rameter 9 is unknown, at the same time we also want to estimate 6. 


3 Sequential Monte Carlo Filtering 


As we mentioned above, apart from filtering for the unobserved states we also want to 
estimate the unknown parameter vector 9 on which the state vector depends. Our approach 
will be to consider 9 as an additional state and thus our goal will be to estimate the posterior 
distribution {p{xi-,t,9\yi-,t)}t>i given by 


pixi:t-,9\yi-t) oc p{xi:t,yi:t,9) 

oc p{xi) ■ p{x 2 \xi; 9) 


■p{xn\xn-i, ...,xi;9) ■Y\p{yi\xi]9) ■p{9), (2) 


2=1 


where p{9\yt) is a prior density for the parameter vector 9, see Subsection 3.1 If the 
parameter is known, then the density is degenerate. Therefore, the additional difficulty 
here is that we need to compute or approximate the theoretical density function p{9\yt). 


3.1 On-line Parameter Estimation 

One approach in the literature ([H], [22], jM], [30|), is to consider that 9 is not fixed and 
assume that it artihcially evolves in time, for example 

9t = 9t-i + et 

where e* is an artihcial white noise with decreasing variance. Then, at each time t, p{9\yt) 
will be updated inside the SISR algorithm in order to incorporate the additional information 
that is obtained. 

As it was discussed in m, this approach leads to artificial variance inflation, since the 
parameter is not truly random. However, the use of a kernel density estimate with shrinkage 
correction can control this artihcial over-dispersion. 

More specihcally, standing at time t, we approximate p{9\yt) by a set of samples 
and weights uj^ using a discrete Monte Carlo. The index t in 0 is in parenthesis to indicate 
that 9 does not evolve in time, but that its value is drawn from the posterior density p{9\yt) 
at time t. Then, the smooth kernel density with shrinkage correction will be of the form 

N 

P{G\yt) ~ , 

i=i 
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where AA(-|m, S) denotes a multivariate Normal density with mean m and variance S. So, 
essentially, p{0\yt) is approximated by a mixture of normals with mean and variance 
h‘^Vt, weighted by sample weights The kernel location is specified by 

= a + {1 — a)6t, 


where a = y/l — and 6t denotes the average over all parameter samples (essentially a 
sum over i). Regarding h, a typical choice would be a decreasing function of the sample 
size, but if one wants to control the loss of information then h? = q — ((3(5 — l)/2(5)^, where 
(5 is a discount factor typically around 0.95 — 0.99 and a becomes a = (3(5 — l)/2(5. 

Therefore, the SISR algorithm is adjusted in order to incorporate the update of 9. 
The key idea of our approach that also allows us to establish asymptotic consistency and 
normality of the estimators, is to re-formulate the weights so that they represent the joint 
posterior p update 9 along with the state vector X. 

Before presenting the algorithm, let us define the un-normalized weight functions. For 
i = 1,..., N, and t = 1 set 


m (v'j; ej;> 


P 




2/1 


Pi 


qi 


X 


d) 


1,1 

(d 


Pi 

id) 


id) 

(1) 


oc 


i{d 

di) 




"^l,ll^(l) 


p(</i|vi:i.<ii 


qi 


X 


Pi 


p(xr>m P{yi x[l9, 


^(d fl{d 

di) 


qi 


X 


(3) 


Generally for t > 1 


„d) 


i(d 

dd 






0 I 


xy X- 




g(0 

yt) 


(4) 


Then, the algorithm is given by: 

At time t = 1 

(a) Sampling 

For / = 1,..., A, sample X^\ ~ ( 7 i(-) and ~ Pi(')- 
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(b) Re-Sampling 

For i = 1,..., A^, let be defined by (3), normalize 

such that J2^=i 1 re-sample 


(i) 

"l 




N 


^ ^(M) 


At time t,t > 2 (step t — 1 ^ t) 


(a) Sampling 

For f = 1,..., A, set 


yi^) _ vb) 


mt -1 = + (1 - a)6»(i_i) 


where = Eili sample 




W 1^21 


and 


1 f Tr/(*) «(*) 


where F_i = ^ Ei=i ( ^i-i^(Ei) “ ^(*-1) 


y(V ^ \ yiV . o(V 

2 


(b) Re-Sampling 

For i = 1,..., A, let wf^ = wt be defined by (4), and normal 


ize _ 


such that Eili = 1 


For f = 1,..., A, re-sample 


N 


~ vrf (dxnt), where Trf (dxi-.t) = y~] w}^’d g) ^u){dxv.t). 

i=i 


and set 


N 


Sm = E F’o,':! 


i=l 


Output The filtering distribution p{dxi:t\yi-.t) is approximated by 
N N 

TT^{dxi.,t) = E ^^{dxi.,t) = ^ E 

i=i ^ i=i 
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and the estimator for 9 is We also record the approximation for the combined distri¬ 
bution 7 r®(dxi:t, which is approximated by 


N 


Nfi 


TT 




i=i 


0 ) 


^ 0 ) M){dx\-,td9u\). 

’^(t) 


3.2 Convergence Results 


Let us now study the convergence properties of this algorithm. Let (/> : x 0 i—)■ M be 

an appropriate test function. Notice now that the SISR algorithm provides us with the 
estimator 


If 




N 




2 = 1 




It is relatively straightforward to see that (j)^ is estimating 


(pt 




where is the value that is drawn from the posterior density p{9\yt) at time t. 

So, it is natural to quantify the performance of the algorithm by studying the conver¬ 
gence of (j)^ to as —)• oo. We define the set of appropriate test functions under which a 
central limit theorem can be established, appropriately formulated for our case of interest: 

^>4 = |0 : df !->■ M measurable : there exists 6 > 0 such that ||VL 40 ||^'''‘^ < oo, 

and xi.( 4 _i) is in $ 4 _i| . (5) 

Following the proof of the central limit theorem results of iniEij for the Markovian case, 
without the parameter estimation aspect, the following result is derived. 

r) I C 

Proposition 1. Let us assume that there exists d > 0 such that for every t < oo E^e llbFiH < 
oo and consider 0 G <I' 4 . Then, we get 

(<^f - ^ (0> (</’)) 

as N ^ oo, where at time t = 1 
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and for t > 1 

t-i 


+ E 


qi{xi)pi ( 0 ( 1 )) 

P^{xi:k,0{k)\yi:t) 


+ 


^_2 •J P(®l:(fc—1) |2/1:(A:—1)) ^{k))qk{xk^ 0(A:) l®l:(fc—1)) 

X {^j f>{xix,e(t))p{xi^k+i)a,^{t)\y{k+i)a,xi,k)dx(^k+i):td9{t) - f>t] dxi-.kdB^k) 


0{t)) - (l)tY dxi-.tdOit). 


p(a;i:(t-i)|yi:(t-i);0(t))gt(xt,0(t)|xi.(i_i)) 

Proof. The proof follows from Proposition A.1.1 of m after making the adequate identifi¬ 
cations. Indeed, for a general sequential importance sampling algorithm with weights Wt, 
Proposition A. 1.1 of m implies that the formula for the variance in question is given by 
t-i 


Wk E e [cft\Xi,k] -^t) + Var^, [Wt{^{xi..t, 9^t)) - ^t)] ■ 


yM) = ^Varp, 

k=l 

In our case we have 

Tr^{dxi:t, d9(^t)) = p{dxi-.t, d9i^t)\yv.t) 
pt{dxi:t,d9(^t)) = P{dxi:(^t-i)\yi:{t-i))qt{dxt,d9(^t)\xi,(^t-i)) 
and the weights take the form 

P{xi:t,9(^t)\yi.,t) 


dirf 


0(q) , (xi:i,0(i)) , . \ f Q I ^ 

Plugging these expressions in the formula for one immediately recovers the form of 


a'^{4>t), completing the proof of the proposition. 


□ 


3.3 Statistical properties of the parameter estimator 

Essentially, 9 is viewed as an augmented state variable. Proposition quantifies the con¬ 
vergence of the filter, but it does not discuss the statistical properties of 9^^. Let us recall 


that 


where 


N 


qN _ \ ^ 


2=1 


m[^i = + (1 - a)^(I-i) 


yN _ 


1 ^ 

2=1 


(d o{N,i) m 
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By inspecting the algorithm it becomes clear that the convergence properties of as 
—> oo is described by a statement very similar to that of Proposition after making 
the choice cj){xi-t,9(^t)) = ^(t)- Propositionshows that at time t, the estimator for 9, 9^^^'^ 
converges to 9(^t) = / 9p{9(^t)\yi-.t)d9 as N ^ oo. 

Proposition 2. Let us assume that there exists 5 > 0 such that for every t < oo IlkPtH < 
oo and consider the function identity i—)• 9(1-^, assuming that it belongs to the set of ap¬ 

propriate test functions defined in 0- Let us also define the mean of the posterior 
distribution p{9(^t)\yi:t) 

9(t) = 1 9p{9(^t)\Yi:t)d9. 

Then, we get 

as N ^ oo. The asymptotic variance cr'^{9(^t)) is defined as follows. At time t = 1 

and for t > 1 


<y^{e) = 


P^{xi,9(i)\yi.,t) 


+ E 


qi{xi)pi (6»(i)) 

P^{xi:k,9(k)\yi:t) 


9{t)P{x2-.t, 9(t)\y2-.u Xi)dx2:td9^t) - 6»(t) 1 dxid9^i) 


+ 


j^_2 •' l:(fc—1) \yi'.{k—l )) ^(fc) {Xk ) ^(fc) 1)) 

f p‘^{xi:t,9(^t)\yi-.t) 


p{xi:(t-l)\yi-.(t-l), 9(t))qt{xt, 9(^t)\xi:(t-l)) 


{9(t)ixi:t) - 9(^t)) dxi.,td9^ty 


( 6 ) 


Proof. This proposition is essentially a special case of Propositionwith 4>{xi-t, 9yj) = 9yy 
We notice that 


9{t) = J 0(t)P{xi:t,9y)\yi.,t)dxi-,td9y) 

= j 9(t)Pixi:t\9(t),Yl:t)p{9(t)\yi:t)dxl-td9y~j 
= j 9p{9y)\yi.,t)d9 

which is the mean of the posterior distribution p(0(t)|?/i:t), as claimed. □ 

The next natural question to ask is whether 9y^ is a consistent estimator of 0 as t —?• oo. 
Let us recall from Subsection 3.1 that choosing 5 around 0.95 — 0.99 implies for the tuning 
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parameters {a,h) ^ (1,0). Subsequently, this means, for every fixed t > 0, that the 
mean ~ and the variance h'^Vt ~ 0, which then implies that for every t > 0, 

Rs Notice that if the tuning parameters {a,h) = (1,0) then the distribution 

p{6(t)\yi-.t) coincides with the posterior distribution p{0\yi:t), as the parameter does not 
involve artihcially in time any more. Hence, in this case, by Doob’s consistency theorem, see 
for example Theorem 10.10 in [33], we would have that the sequence of posterior measures 
consistent under 0 if the model is identifiable, i.e., if IPei / / ^ 2 - In other 

words, in such a case, for every prior probability measure H on 0 the sequence of posterior 
measures is consistent for H—almost every 0. However, in the algorithm and for the 

purposes of dealing with the issue of degeneracy of the particles, we artificially evolve the 
unknown parameter which means that we take the tuning parameters to be (a, h) ~ (1, 0) 
but not exactly equal to (1,0). As we shall see from the simulation studies of Section]^ this 
is sufficient to guarantee that the parameter is being consistently estimated, as expected. 

4 Simulation Results 

4.1 Fractional ARIMA process 

The fractional ARIMA (AutoRegressive Integrated Moving Average) process was proposed 
by Box and Jenkins, [5], in 1970 and has been very popular in applied time series. A 
fractional ARIMA(p, d, q) process is formally dehned as follows (due to Granger and Joyeux, 

1231): 

Definition 2. Let ip{-) and ??(•) be polynomials of orders p and q respectively and Xt a 
stationary process such that 

d G (—1/2,1/2) and and {r]t)t>o is a sequence of iid variables with mean 0 and varianee 1. 
Then, the proeess {Xt}t>o is called a fractional ARIMA(p,d,q) process. 

In contrast to the classical ARIMA(p, d, q) process, where the parameter d is an integer, 
in the fractional case d is a real valued parameter with values between (—1/2,1/2). It is 
called the fractional integration parameter and is related to the Hurst index, id in Q, via 
d = H — B denotes the lag or backshift operator, and 

where the sum is taken over an infinite number of indices. The fractional ARIMA process 
is long-range dependent when d > 0, while the upper bound on d is needed to ensure that 
the process is stationary. More details regarding these models can be found in Beran |3|. 
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In our framework, we consider a state-space model in which the unobserved process 
is modeled by a Fractional ARIMA(p, d, q) process. Specifically, the state-space model is 
defined as follows 

[ ^{B) (1 - B)^ Xt ='diB) r,t, 

where and r]t are two independent iid sequences of Gaussian random variables and a{-) 
is a known function. 


4.1.1 SISR for Fractional ARIMA process with known parameter 

We apply our algorithm to simulated data from an ARIMA(1, 0.3, 0) model with parameter 
if = 0.8. That is 

f Yt = \Xt\ et 

\ t \ t 

[ (1 - if B) {1 - B)<^ Xt = Tit, 

The simulated model is shown in Figure [^a) and the estimated filter using the SISR 
algorithm is depicted in Figure [^b). We choose as tuning parameters 6 = 0.98 which then 
implies {a,h) = (0.9506,0.096). 








(a) Simulated Model 


(b) SISR Filter 


Figure 1: SISR filter for the fractional ARIMA(1, 0.3, 0) model. Here, the parameter ip is 
assumed to be known, (/? = 0.8. 


From the SISR plot (Figure]^), we can see that the approximation of the unobserved 
process (solid line) follows closely the true process (dashed line). In addition, the true data, 
all lie within the 95% confidence interval (dotted lines) estimated using the SISR algorithm. 
In our simulation study, we tried different values of d and the results are similar. 
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4.1.2 SISR for Fractional ARIMA process with unknown parameter 

Consider again model Q, but now assume that the parameter ip is unknown. Our goal is 
to estimate ip using the SISR algorithm. The results are summarized in Figures and 
From Figure]^ we can see that the approximation of the unobserved process remains good, 
and the 95% confidence interval still captures the process. 



Time 


Figure 2: SISR filter for the ARIMA(1, 0.3, 0) model. In this case, the parameter ip is 
assumed to be unknown and is estimated from the algorithm. 

In Figure we investigate the convergence of the parameter to the true value. The 
estimated parameter p is slightly noisy, but it converges to the true value 0.8. The difference 
in the two graphs in Figure is that the second one has a significantly larger number of 
simulated particles, and it seems that this slightly improves the smoothness of the curve. 

In Figure we compare the empirical variance of the estimator for N = 500 versus 
N = 2500 across all t G [1,1000]. It is clear that the variance decreases as N increases, but 
at the same time the variance does increase over time. The latter is consistent with the 
theoretical limiting variance as obtained in 

5 Application to S&P 500 Data 

In this section, we apply our method to real data. As an example, we are working with a 
long-range dependent state-space model in finance. The observed process are the returns of 
the underlying asset (S&: P 500 index to be precise) and the unobserved process is the asset’s 
volatility. Based on the financial literature daEsii), we assume that the volatility process 
is long-range dependent, and the model we focus on is the long memory stochastic volatility 
model in discrete time that is described by 0. This model was introduced simultaneously 
by Breidt et al. [6] and Harvey, [26] in 1993. 
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Time Time 

(a) Number of Particles ^"=500 (b) Number of Particles A^=2,500 

Figure 3: Convergence of the estimated parameter as a function of time, for two different 
choices of number of particles. 




(a) Number of Particles ^"=500 (b) Number of Particles A^=2,500 

Figure 4: Empirical Variance of the estimated parameters as a function of time for N=500 
and N=2,500 


To further specify this model in practice, we need to determine the order of the polyno¬ 
mials (/>(•) and 0(-}. This is a common task in time series analysis and for details we refer 
to Hamilton [21]. Based on a preliminary analysis, we choose to work with a Fractional 
ARIMA(l,d, 1) model, which is also in accordance to the model suggested by [T]. To be 
precise, the model we will be working with is 

I (f) €t 

\ (1 - <pB) (1 - B)^ Xt = ^ m-i + Vt, 

where cr{x) = logx. 

The data set we consider contains daily returns of the S&P 500 for one year, that is 
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about 252 observations, starting in January 2010 until December 2010. 

One assumption that we made in the SMC algorithm is that the parameter d is known. 
However, when it comes to real data, this is something that we need to estimate. Here, we 
used the Geweke and Porter-Hudak estimator, [7], which yields d = 0.2. Then, we apply 
the SISR algorithm to estimate the remaining unknown parameters of the model ip and rl. 
We choose as tuning parameters 6 = 0.986 which then implies {a,h) = (0.965,0.068). 

The algorithm has two outputs. The first one is the distribution of the unobserved 
volatility, which is given in Figure using 500 trajectories, and the second one is the 
estimated vector of parameters, which are plotted as a function of time in Figure]^ 





Figure 5: Histogram of the empirical distribution of the unobserved volatility. As a refer¬ 
ence, the implied volatility for the same period was 0.355. 

Using the model parameters we estimated, we also do an out-of-sample prediction of 
the values of the underlying asset, which is shown in Figure [7| By doing an one-step 
prediction each time, we forecast 20 daily values of the index. The 95% conhdence intervals 
are computed using boostrap. In Figure we also present the empirical variance for the 
estimators (p and 6. In Figure we also present the residuals of the fitted model. 


6 Conclusion 

To summarize, in this article we extended the standard SISR algorithm to incorporate the 
case that the observations are long-range dependent. Our findings show that the results 
are very close to the case that the observations are independent or Markov. However, 
the main drawback of this method is the computational time that is required to perform 
the iterations. Since we need to take into account, and technically speaking to store all 
past values of the trajectory, this increases the computational time and complexity of the 
method. In addition, by naturally extending existing results in the literature, we proved 
that the filter converges to the true distribution of the unobserved process. 

Our second outcome, was the development of an SISR algorithm that along with the 
estimation of the unobserved distribution of the hidden process, also estimated unknown 
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(a) Estimator of ip 


(b) Estimator of 



(c) Empirical Variance of (p 


(d) Empirical Variance of 


Eigure 6: Estimators for ip and 'd for the S&P 500 data set. Solid lines represent the 
estimated values for p = 0.842 and 6 = 0.01 for cj) and 9 respectively. 


model parameters. Our approach was dynamic, in the sense that the parameter was re¬ 
garded as “time-varying” and thus the parameter estimators were updated at every step of 
the algorithm. We also showed that the proposed estimators for the unknown parameter are 
consistent and asymptotically normal and we corroborated these results with a simulation 
study. 

There are quite a few open problems that we would like to investigate in the future. 
The first one is to study ways to improve the computational efficiency of the algorithm. 
In our approach, we used all the history of the trajectory to run the algorithm, which 
severely affected the computational efficiency, but it would be interesting to investigate if 
a “window” approach would provide us with a reasonable estimator for the filter and/or 
the parameter, and possibly quantify the loss that one might have by doing so in terms of 
accuracy. 

In addition, one question that we did not address in this paper, is what happens with the 
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Figure 7: Out-of-Sample Prediction of the S&: P 500 values. The (black) solid continuous 
line are the true S& P 500 values, while the (red) solid step line is our estimation. The 
(red) dotted lines form the 95% prediction interval. 



(a) Standard Residuals (b) ACF of the Residuals 

Figure 8: Residuals of the fitted fractional ARIMA (1, 0.2, 1) model with (/? = 0.842 and 

d = 0.01. 


long memory parameter in practice. In our approach, we assumed that d (or equivalently 
H) is known (given or estimated from the data). However, it is an open question how 
one would consistently estimate the memory parameter in the scenario that the long-range 
dependent process is hidden. 

The goal of this first paper on the topic is to lay down the algorithm, its properties 
and to understand the main issues that the presence of memory brings. In this paper, we 
formulated the algorithm and established some baseline theoretical properties. We also 
performed numerical experiments using both simulation data and real data, in order to test 
the performance of the proposed algorithm in practice. We plan to investigate the open 
computational and theoretical issues mentioned above in a systematic way in future works. 
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